{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial: Plugging User-Designed Methods into DANCE 2.0 for Auto-Search\n",
    "\n",
    "In this notebook, we'll walk through how to integrate a new algorithm (specifically an SVM classifier) into the auto-search framework outlined in your documentation. We will:\n",
    "\n",
    "1. Inherit from the **BaseClassificationMethod** (or another suitable base) to define our custom method. Implement the required interfaces (`fit`, `predict`, and optionally `preprocessing_pipeline`).  \n",
    "2. Show how to run the hyperparameter search using the integrated method.  \n",
    "3. Provide an example `main.py`-like script that demonstrates how the auto-search process is orchestrated.\n",
    "\n",
    "## 1. Folder Structure & Requirements\n",
    "\n",
    "Before diving in, ensure you have the following directory structure (at least conceptually; your actual project structure can be more extensive):"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "examples/tuning/\n",
    "└── classification_svm/\n",
    "    ├── main.py\n",
    "    ├── tutorial.ipynb  \n",
    "    └── dataset_name/\n",
    "        ├── pipeline_params_tuning_config.yaml\n",
    "        └── config_yamls/\n",
    "            ├── 0_test_acc_params_tuning_config.yaml\n",
    "            ├── 1_test_acc_params_tuning_config.yaml\n",
    "            └── 2_test_acc_params_tuning_config.yaml\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Where `cta_svm` is the directory we created for our new algorithm. The same pattern can apply for other methods, such as `clustering_kmeans`, `regression_linreg`, etc.\n",
    "\n",
    "We'll focus on the **SVM** example below.\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Defining Our SVM Classifier\n",
    "\n",
    "Suppose we want to define a custom SVM method for classification.\n",
    "We'll inherit from BaseClassificationMethod and implement the required methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/zyxing/dance/dance/utils/matrix.py:178: NumbaExperimentalFeatureWarning: First-class function type feature is experimental\n",
      "  for j in numba.prange(n):\n",
      "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/numba/np/ufunc/parallel.py:371: NumbaWarning: The TBB threading layer requires TBB version 2021 update 6 or later i.e., TBB_INTERFACE_VERSION >= 12060. Found TBB_INTERFACE_VERSION = 12050. The TBB threading layer is disabled.\n",
      "  warnings.warn(problem)\n"
     ]
    }
   ],
   "source": [
    "from typing import Optional\n",
    "from dance.modules.base import BaseClassificationMethod\n",
    "from sklearn.svm import SVC\n",
    "import numpy as np\n",
    "\n",
    "from dance.transforms.cell_feature import WeightedFeaturePCA\n",
    "from dance.transforms.misc import Compose, SetConfig\n",
    "from dance.typing import LogLevel\n",
    "\n",
    "class SVM(BaseClassificationMethod):\n",
    "    \"\"\"The SVM cell-type classification model.\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    args : argparse.Namespace\n",
    "        A Namespace contains arguments of SVM. See parser help document for more info.\n",
    "    prj_path: str\n",
    "        project path\n",
    "\n",
    "    \"\"\"\n",
    "\n",
    "    def __init__(self, args, prj_path=\"./\", random_state: Optional[int] = None):\n",
    "        self.args = args\n",
    "        self.random_state = random_state\n",
    "        self._mdl = SVC(random_state=random_state, probability=True)\n",
    "\n",
    "    @staticmethod\n",
    "    def preprocessing_pipeline(n_components: int = 400, log_level: LogLevel = \"INFO\"):\n",
    "        return Compose(\n",
    "            WeightedFeaturePCA(n_components=n_components, split_name=\"train\"),\n",
    "            SetConfig({\n",
    "                \"feature_channel\": \"WeightedFeaturePCA\",\n",
    "                \"label_channel\": \"cell_type\"\n",
    "            }),\n",
    "            log_level=log_level,\n",
    "        )\n",
    "\n",
    "    def fit(self, x: np.ndarray, y: np.ndarray):\n",
    "        \"\"\"Train the classifier.\n",
    "\n",
    "        Parameters\n",
    "        ----------\n",
    "        x\n",
    "            Training cell features.\n",
    "        y\n",
    "            Training labels.\n",
    "\n",
    "        \"\"\"\n",
    "        self._mdl.fit(x, y)\n",
    "\n",
    "    def predict(self, x: np.ndarray):\n",
    "        \"\"\"Predict cell labels.\n",
    "\n",
    "        Parameters\n",
    "        ----------\n",
    "        x\n",
    "            Samples to be predicted (samplex x features).\n",
    "\n",
    "        Returns\n",
    "        -------\n",
    "        y\n",
    "            Predicted labels of the input samples.\n",
    "\n",
    "        \"\"\"\n",
    "        return self._mdl.predict(x)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Example `main.py` File\n",
    "\n",
    "Below is an example of how your `main.py` might look if you're adding SVM as one of the classification methods. This file orchestrates the entire pipeline:\n",
    "\n",
    "1. **Register** preprocessing functions through annotations (optional)\n",
    "2. **Parsing Arguments** and configuring hyperparameters.  \n",
    "3. **Defining** an evaluation function that:  \n",
    "   - Loads and preprocesses the data.  \n",
    "   - Initializes your model (the new SVM class).  \n",
    "   - Trains and scores the model.  \n",
    "   - Logs results to Weights & Biases (wandb).  \n",
    "4. **Running** the hyperparameter sweep agent (e.g., via `wandb_sweep_agent`).  \n",
    "5. **Saving** results and optionally generating a second-stage tuning config file.\n",
    "\n",
    "> **Note**: For demonstration, only relevant code is shown. Adjust as needed for your exact pipeline or data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\" \n",
    "Step 1: preprocessing functions can be registered using register_preprocessor. \n",
    "In this example, the GaussRandProjFeature preprocessing function is registered within the feature.cell pipeline. \n",
    "This registered function can later be specified in the configuration file.\n",
    "\"\"\"\n",
    "from sklearn.random_projection import GaussianRandomProjection\n",
    "from dance.registry import register_preprocessor\n",
    "from dance.transforms.base import BaseTransform\n",
    "\n",
    "\n",
    "@register_preprocessor(\"feature\", \"cell\",overwrite=True)  # NOTE: register any custom preprocessing function to be used for tuning\n",
    "class GaussRandProjFeature(BaseTransform):\n",
    "    \"\"\"Custom preprocessing to extract cell feature via Gaussian random projection.\"\"\"\n",
    "\n",
    "    _DISPLAY_ATTRS = (\"n_components\", \"eps\")\n",
    "\n",
    "    def __init__(self, n_components: int = 400, eps: float = 0.1, **kwargs):\n",
    "        super().__init__(**kwargs)\n",
    "        self.n_components = n_components\n",
    "        self.eps = eps\n",
    "\n",
    "    def __call__(self, data):\n",
    "        feat = data.get_feature(return_type=\"numpy\")\n",
    "        grp = GaussianRandomProjection(n_components=self.n_components, eps=self.eps)\n",
    "\n",
    "        self.logger.info(f\"Start generateing cell feature via Gaussian random projection (d={self.n_components}).\")\n",
    "        data.data.obsm[self.out] = grp.fit_transform(feat)\n",
    "\n",
    "        return data\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO][2025-08-20 12:25:15,393][dance][main] \n",
      " files is saved in /home/zyxing/dance/examples/tuning/custom-methods/328_138\n",
      "[INFO][2025-08-20 12:25:15,411][dance][config] tune mode is set to pipeline_params, tune_mode will first be converted to pipeline\n",
      "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n",
      "[INFO][2025-08-20 12:25:17,271][dance][wandb_sweep] \u001b[94m\n",
      "\n",
      "\t[*] Sweep ID: 15layk3y\n",
      "\u001b[0m\n",
      "[INFO][2025-08-20 12:25:17,272][dance][wandb_sweep_agent] Spawning agent: sweep_id='15layk3y', entity='xzy11632', project='dance-dev', count=2\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Create sweep with ID: 15layk3y\n",
      "Sweep URL: https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: dduxrl3d with config:\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.0.filter.gene: FilterGenesPercentile\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.1.normalize: ColumnSumNormalize\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.2.filter.gene: FilterGenesRegression\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.3.feature.cell: CellPCA\n",
      "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: Currently logged in as: \u001b[33mxzy11632\u001b[0m. Use \u001b[1m`wandb login --relogin`\u001b[0m to force relogin\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "wandb version 0.21.1 is available!  To upgrade, please run:\n",
       " $ pip install wandb --upgrade"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Tracking run with wandb version 0.16.3"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Run data is saved locally in <code>/home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122520-dduxrl3d</code>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Syncing run <strong><a href='https://wandb.ai/xzy11632/dance-dev/runs/dduxrl3d' target=\"_blank\">rural-sweep-1</a></strong> to <a href='https://wandb.ai/xzy11632/dance-dev' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/run' target=\"_blank\">docs</a>)<br/>Sweep page: <a href='https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View project at <a href='https://wandb.ai/xzy11632/dance-dev' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View sweep at <a href='https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View run at <a href='https://wandb.ai/xzy11632/dance-dev/runs/dduxrl3d' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/runs/dduxrl3d</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO][2025-08-20 12:25:32,694][dance][set_seed] Setting global random seed to 10\n",
      "[INFO][2025-08-20 12:25:32,696][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv\n",
      "[INFO][2025-08-20 12:25:32,983][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv\n",
      "[INFO][2025-08-20 12:25:33,104][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv\n",
      "[INFO][2025-08-20 12:25:33,107][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv\n",
      "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.\n",
      "  warnings.warn(\n",
      "[INFO][2025-08-20 12:25:33,354][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088\n",
      "[INFO][2025-08-20 12:25:33,355][dance][_load_raw_data] Number of training samples: 262\n",
      "[INFO][2025-08-20 12:25:33,356][dance][_load_raw_data] Number of valid samples: 66\n",
      "[INFO][2025-08-20 12:25:33,357][dance][_load_raw_data] Number of testing samples: 138\n",
      "[INFO][2025-08-20 12:25:33,357][dance][_load_raw_data] Cell-types (n=9):\n",
      "['OPC',\n",
      " 'astrocytes',\n",
      " 'endothelial',\n",
      " 'fetal_quiescent',\n",
      " 'fetal_replicating',\n",
      " 'hybrid',\n",
      " 'microglia',\n",
      " 'neurons',\n",
      " 'oligodendrocytes']\n",
      "[INFO][2025-08-20 12:25:33,360][dance][load_data] Raw data loaded:\n",
      "Data object that wraps (.data):\n",
      "AnnData object with n_obs × n_vars = 466 × 22088\n",
      "    uns: 'dance_config'\n",
      "    obsm: 'cell_type'\n",
      "[INFO][2025-08-20 12:25:33,361][dance][wrapped_func] Took 0:00:00.665608 to load and process data.\n",
      "[INFO][2025-08-20 12:25:33,361][dance][generate_config] The content in pipeline_params will be converted to pipeline\n",
      "[INFO][2025-08-20 12:25:33,363][dance][_sanitize_pipeline] Pipeline plan:\n",
      "\u001b[92m['FilterGenesPercentile',\n",
      " 'ColumnSumNormalize',\n",
      " 'FilterGenesRegression',\n",
      " 'CellPCA',\n",
      " None]\u001b[0m\n",
      "[WARNING][2025-08-20 12:25:33,370][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data\n",
      "[WARNING][2025-08-20 12:25:33,376][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data\n",
      "/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.\n",
      "  warnings.warn(\"Expecting count data as input, but the input feature matrix does not appear to be count.\"\n",
      "[INFO][2025-08-20 12:25:33,618][dance][_filter_enclasc] Start generating cell features using EnClaSC\n",
      "[WARNING][2025-08-20 12:25:33,641][dance.CellPCA][__call__] n_components=400 must be between 0 and min(n_samples, n_features)=100 with svd_solver='auto'\n",
      "[INFO][2025-08-20 12:25:33,652][dance.CellPCA][__call__] Generating cell PCA features (466, 100) (k=100)\n",
      "[INFO][2025-08-20 12:25:33,654][dance.CellPCA][__call__] Top 10 explained variances: [0.11390967 0.07235937 0.05432951 0.04682069 0.04452541 0.03371136\n",
      " 0.02961524 0.0270735  0.025507   0.02293849]\n",
      "[INFO][2025-08-20 12:25:33,655][dance.CellPCA][__call__] Total explained variance: 100.00%\n",
      "[INFO][2025-08-20 12:25:33,657][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'\n",
      "[INFO][2025-08-20 12:25:33,658][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "e6ac2024dd3940f589dfb94f09bd833e",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<style>\n",
       "    table.wandb td:nth-child(1) { padding: 0 10px; text-align: left ; width: auto;} td:nth-child(2) {text-align: left ; width: 100%}\n",
       "    .wandb-row { display: flex; flex-direction: row; flex-wrap: wrap; justify-content: flex-start; width: 100% }\n",
       "    .wandb-col { display: flex; flex-direction: column; flex-basis: 100%; flex: 1; padding: 10px; }\n",
       "    </style>\n",
       "<div class=\"wandb-row\"><div class=\"wandb-col\"><h3>Run history:</h3><br/><table class=\"wandb\"><tr><td>acc</td><td>▁</td></tr><tr><td>test_acc</td><td>▁</td></tr><tr><td>train_acc</td><td>▁</td></tr></table><br/></div><div class=\"wandb-col\"><h3>Run summary:</h3><br/><table class=\"wandb\"><tr><td>acc</td><td>0.51515</td></tr><tr><td>test_acc</td><td>0.10145</td></tr><tr><td>train_acc</td><td>0.65649</td></tr></table><br/></div></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View run <strong style=\"color:#cdcd00\">rural-sweep-1</strong> at: <a href='https://wandb.ai/xzy11632/dance-dev/runs/dduxrl3d' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/runs/dduxrl3d</a><br/>Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Find logs at: <code>./wandb/run-20250820_122520-dduxrl3d/logs</code>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: nxe1pfd6 with config:\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.0.filter.gene: FilterGenesPercentile\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.1.normalize: ColumnSumNormalize\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.2.filter.gene: FilterGenesRegression\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tpipeline.3.feature.cell: CellSVD\n",
      "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "wandb version 0.21.1 is available!  To upgrade, please run:\n",
       " $ pip install wandb --upgrade"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Tracking run with wandb version 0.16.3"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Run data is saved locally in <code>/home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122547-nxe1pfd6</code>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Syncing run <strong><a href='https://wandb.ai/xzy11632/dance-dev/runs/nxe1pfd6' target=\"_blank\">olive-sweep-2</a></strong> to <a href='https://wandb.ai/xzy11632/dance-dev' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/run' target=\"_blank\">docs</a>)<br/>Sweep page: <a href='https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View project at <a href='https://wandb.ai/xzy11632/dance-dev' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View sweep at <a href='https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/sweeps/15layk3y</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View run at <a href='https://wandb.ai/xzy11632/dance-dev/runs/nxe1pfd6' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/runs/nxe1pfd6</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO][2025-08-20 12:25:58,365][dance][set_seed] Setting global random seed to 10\n",
      "[INFO][2025-08-20 12:25:58,368][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv\n",
      "[INFO][2025-08-20 12:25:58,770][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv\n",
      "[INFO][2025-08-20 12:25:58,914][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv\n",
      "[INFO][2025-08-20 12:25:58,919][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv\n",
      "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.\n",
      "  warnings.warn(\n",
      "[INFO][2025-08-20 12:25:59,103][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088\n",
      "[INFO][2025-08-20 12:25:59,104][dance][_load_raw_data] Number of training samples: 262\n",
      "[INFO][2025-08-20 12:25:59,105][dance][_load_raw_data] Number of valid samples: 66\n",
      "[INFO][2025-08-20 12:25:59,106][dance][_load_raw_data] Number of testing samples: 138\n",
      "[INFO][2025-08-20 12:25:59,107][dance][_load_raw_data] Cell-types (n=9):\n",
      "['OPC',\n",
      " 'astrocytes',\n",
      " 'endothelial',\n",
      " 'fetal_quiescent',\n",
      " 'fetal_replicating',\n",
      " 'hybrid',\n",
      " 'microglia',\n",
      " 'neurons',\n",
      " 'oligodendrocytes']\n",
      "[INFO][2025-08-20 12:25:59,110][dance][load_data] Raw data loaded:\n",
      "Data object that wraps (.data):\n",
      "AnnData object with n_obs × n_vars = 466 × 22088\n",
      "    uns: 'dance_config'\n",
      "    obsm: 'cell_type'\n",
      "[INFO][2025-08-20 12:25:59,111][dance][wrapped_func] Took 0:00:00.743655 to load and process data.\n",
      "[INFO][2025-08-20 12:25:59,111][dance][generate_config] The content in pipeline_params will be converted to pipeline\n",
      "[INFO][2025-08-20 12:25:59,114][dance][_sanitize_pipeline] Pipeline plan:\n",
      "\u001b[92m['FilterGenesPercentile',\n",
      " 'ColumnSumNormalize',\n",
      " 'FilterGenesRegression',\n",
      " 'CellSVD',\n",
      " None]\u001b[0m\n",
      "[WARNING][2025-08-20 12:25:59,121][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data\n",
      "[WARNING][2025-08-20 12:25:59,127][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data\n",
      "/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.\n",
      "  warnings.warn(\"Expecting count data as input, but the input feature matrix does not appear to be count.\"\n",
      "[INFO][2025-08-20 12:25:59,320][dance][_filter_enclasc] Start generating cell features using EnClaSC\n",
      "[WARNING][2025-08-20 12:25:59,340][dance.CellSVD][__call__] n_components=400 must be between 0 and min(n_samples, n_features)=100 with svd_solver='full'\n",
      "[INFO][2025-08-20 12:25:59,388][dance.CellSVD][__call__] Generating cell SVD features (466, 100) (k=100)\n",
      "[INFO][2025-08-20 12:25:59,389][dance.CellSVD][__call__] Top 10 explained variances: [0.0475235  0.10532387 0.06999493 0.05225454 0.04628773 0.03516576\n",
      " 0.03369351 0.02756703 0.02625577 0.02297754]\n",
      "[INFO][2025-08-20 12:25:59,390][dance.CellSVD][__call__] Total explained variance: 100.00%\n",
      "[INFO][2025-08-20 12:25:59,391][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'\n",
      "[INFO][2025-08-20 12:25:59,392][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "f1bbc28a9b174491931df5bffdad81bf",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<style>\n",
       "    table.wandb td:nth-child(1) { padding: 0 10px; text-align: left ; width: auto;} td:nth-child(2) {text-align: left ; width: 100%}\n",
       "    .wandb-row { display: flex; flex-direction: row; flex-wrap: wrap; justify-content: flex-start; width: 100% }\n",
       "    .wandb-col { display: flex; flex-direction: column; flex-basis: 100%; flex: 1; padding: 10px; }\n",
       "    </style>\n",
       "<div class=\"wandb-row\"><div class=\"wandb-col\"><h3>Run history:</h3><br/><table class=\"wandb\"><tr><td>acc</td><td>▁</td></tr><tr><td>test_acc</td><td>▁</td></tr><tr><td>train_acc</td><td>▁</td></tr></table><br/></div><div class=\"wandb-col\"><h3>Run summary:</h3><br/><table class=\"wandb\"><tr><td>acc</td><td>0.5</td></tr><tr><td>test_acc</td><td>0.0942</td></tr><tr><td>train_acc</td><td>0.64122</td></tr></table><br/></div></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View run <strong style=\"color:#cdcd00\">olive-sweep-2</strong> at: <a href='https://wandb.ai/xzy11632/dance-dev/runs/nxe1pfd6' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/runs/nxe1pfd6</a><br/>Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Find logs at: <code>./wandb/run-20250820_122547-nxe1pfd6/logs</code>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO][2025-08-20 12:26:09,505][dance][wandb_sweep] \u001b[94m\n",
      "\n",
      "\t[*] Sweep ID: 1f9pschy\n",
      "\u001b[0m\n",
      "[INFO][2025-08-20 12:26:09,506][dance][wandb_sweep_agent] Spawning agent: sweep_id='1f9pschy', entity='xzy11632', project='dance-dev', count=2\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Create sweep with ID: 1f9pschy\n",
      "Sweep URL: https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: 69ew4oa2 with config:\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.max_val: 98\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.min_val: 8\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.mode: rv\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.eps: 0.7\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.mode: minmax\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.method: scmap\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.num_genes: 5388\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellPCA.n_components: 227\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellPCA.svd_solver: arpack\n",
      "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "wandb version 0.21.1 is available!  To upgrade, please run:\n",
       " $ pip install wandb --upgrade"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Tracking run with wandb version 0.16.3"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Run data is saved locally in <code>/home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122613-69ew4oa2</code>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Syncing run <strong><a href='https://wandb.ai/xzy11632/dance-dev/runs/69ew4oa2' target=\"_blank\">morning-sweep-1</a></strong> to <a href='https://wandb.ai/xzy11632/dance-dev' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/run' target=\"_blank\">docs</a>)<br/>Sweep page: <a href='https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View project at <a href='https://wandb.ai/xzy11632/dance-dev' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View sweep at <a href='https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View run at <a href='https://wandb.ai/xzy11632/dance-dev/runs/69ew4oa2' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/runs/69ew4oa2</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO][2025-08-20 12:26:24,459][dance][set_seed] Setting global random seed to 10\n",
      "[INFO][2025-08-20 12:26:24,461][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv\n",
      "[INFO][2025-08-20 12:26:24,854][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv\n",
      "[INFO][2025-08-20 12:26:24,996][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv\n",
      "[INFO][2025-08-20 12:26:25,000][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv\n",
      "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.\n",
      "  warnings.warn(\n",
      "[INFO][2025-08-20 12:26:25,171][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088\n",
      "[INFO][2025-08-20 12:26:25,173][dance][_load_raw_data] Number of training samples: 262\n",
      "[INFO][2025-08-20 12:26:25,173][dance][_load_raw_data] Number of valid samples: 66\n",
      "[INFO][2025-08-20 12:26:25,175][dance][_load_raw_data] Number of testing samples: 138\n",
      "[INFO][2025-08-20 12:26:25,175][dance][_load_raw_data] Cell-types (n=9):\n",
      "['OPC',\n",
      " 'astrocytes',\n",
      " 'endothelial',\n",
      " 'fetal_quiescent',\n",
      " 'fetal_replicating',\n",
      " 'hybrid',\n",
      " 'microglia',\n",
      " 'neurons',\n",
      " 'oligodendrocytes']\n",
      "[INFO][2025-08-20 12:26:25,179][dance][load_data] Raw data loaded:\n",
      "Data object that wraps (.data):\n",
      "AnnData object with n_obs × n_vars = 466 × 22088\n",
      "    uns: 'dance_config'\n",
      "    obsm: 'cell_type'\n",
      "[INFO][2025-08-20 12:26:25,180][dance][wrapped_func] Took 0:00:00.720475 to load and process data.\n",
      "[INFO][2025-08-20 12:26:25,182][dance][_sanitize_params] Params plan:\n",
      "\u001b[92m[{'max_val': 98, 'min_val': 8, 'mode': 'rv'},\n",
      " {'eps': 0.7, 'mode': 'minmax'},\n",
      " {'method': 'scmap', 'num_genes': 5388},\n",
      " {'n_components': 227, 'svd_solver': 'arpack'},\n",
      " None]\u001b[0m\n",
      "[WARNING][2025-08-20 12:26:25,193][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data\n",
      "[WARNING][2025-08-20 12:26:25,200][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data\n",
      "/home/zyxing/dance/dance/transforms/filter.py:490: RuntimeWarning: invalid value encountered in divide\n",
      "  gene_summary = np.nan_to_num(np.array(x.var(0) / x.mean(0)), posinf=0, neginf=0).ravel()\n",
      "/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.\n",
      "  warnings.warn(\"Expecting count data as input, but the input feature matrix does not appear to be count.\"\n",
      "[INFO][2025-08-20 12:26:25,452][dance][_filter_scmap] Start generating cell features using scmap\n",
      "[INFO][2025-08-20 12:26:26,328][dance.CellPCA][__call__] Generating cell PCA features (466, 5388) (k=227)\n",
      "[INFO][2025-08-20 12:26:26,330][dance.CellPCA][__call__] Top 10 explained variances: [0.05885801 0.0215941  0.01331656 0.01141509 0.00957707 0.00813536\n",
      " 0.00765066 0.00701284 0.00689292 0.00644914]\n",
      "[INFO][2025-08-20 12:26:26,331][dance.CellPCA][__call__] Total explained variance: 76.84%\n",
      "[INFO][2025-08-20 12:26:26,332][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'\n",
      "[INFO][2025-08-20 12:26:26,333][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "5e22aa93f5844c9a8e659c22be9c608a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<style>\n",
       "    table.wandb td:nth-child(1) { padding: 0 10px; text-align: left ; width: auto;} td:nth-child(2) {text-align: left ; width: 100%}\n",
       "    .wandb-row { display: flex; flex-direction: row; flex-wrap: wrap; justify-content: flex-start; width: 100% }\n",
       "    .wandb-col { display: flex; flex-direction: column; flex-basis: 100%; flex: 1; padding: 10px; }\n",
       "    </style>\n",
       "<div class=\"wandb-row\"><div class=\"wandb-col\"><h3>Run history:</h3><br/><table class=\"wandb\"><tr><td>acc</td><td>▁</td></tr><tr><td>test_acc</td><td>▁</td></tr><tr><td>train_acc</td><td>▁</td></tr></table><br/></div><div class=\"wandb-col\"><h3>Run summary:</h3><br/><table class=\"wandb\"><tr><td>acc</td><td>0.71212</td></tr><tr><td>test_acc</td><td>0.36232</td></tr><tr><td>train_acc</td><td>0.89695</td></tr></table><br/></div></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View run <strong style=\"color:#cdcd00\">morning-sweep-1</strong> at: <a href='https://wandb.ai/xzy11632/dance-dev/runs/69ew4oa2' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/runs/69ew4oa2</a><br/>Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Find logs at: <code>./wandb/run-20250820_122613-69ew4oa2/logs</code>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: h8j0oo74 with config:\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.max_val: 96\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.min_val: 1\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.mode: cv\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.eps: 0.3\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.mode: standardize\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.method: seurat3\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.num_genes: 9419\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellPCA.n_components: 636\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellPCA.svd_solver: arpack\n",
      "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "Tracking run with wandb version 0.16.3"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Run data is saved locally in <code>/home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122640-h8j0oo74</code>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Syncing run <strong><a href='https://wandb.ai/xzy11632/dance-dev/runs/h8j0oo74' target=\"_blank\">earnest-sweep-2</a></strong> to <a href='https://wandb.ai/xzy11632/dance-dev' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/run' target=\"_blank\">docs</a>)<br/>Sweep page: <a href='https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View project at <a href='https://wandb.ai/xzy11632/dance-dev' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View sweep at <a href='https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/sweeps/1f9pschy</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View run at <a href='https://wandb.ai/xzy11632/dance-dev/runs/h8j0oo74' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/runs/h8j0oo74</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO][2025-08-20 12:26:51,507][dance][set_seed] Setting global random seed to 10\n",
      "[INFO][2025-08-20 12:26:51,512][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv\n",
      "[INFO][2025-08-20 12:26:51,877][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv\n",
      "[INFO][2025-08-20 12:26:52,016][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv\n",
      "[INFO][2025-08-20 12:26:52,020][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv\n",
      "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.\n",
      "  warnings.warn(\n",
      "[INFO][2025-08-20 12:26:52,175][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088\n",
      "[INFO][2025-08-20 12:26:52,177][dance][_load_raw_data] Number of training samples: 262\n",
      "[INFO][2025-08-20 12:26:52,177][dance][_load_raw_data] Number of valid samples: 66\n",
      "[INFO][2025-08-20 12:26:52,178][dance][_load_raw_data] Number of testing samples: 138\n",
      "[INFO][2025-08-20 12:26:52,179][dance][_load_raw_data] Cell-types (n=9):\n",
      "['OPC',\n",
      " 'astrocytes',\n",
      " 'endothelial',\n",
      " 'fetal_quiescent',\n",
      " 'fetal_replicating',\n",
      " 'hybrid',\n",
      " 'microglia',\n",
      " 'neurons',\n",
      " 'oligodendrocytes']\n",
      "[INFO][2025-08-20 12:26:52,181][dance][load_data] Raw data loaded:\n",
      "Data object that wraps (.data):\n",
      "AnnData object with n_obs × n_vars = 466 × 22088\n",
      "    uns: 'dance_config'\n",
      "    obsm: 'cell_type'\n",
      "[INFO][2025-08-20 12:26:52,182][dance][wrapped_func] Took 0:00:00.670366 to load and process data.\n",
      "[INFO][2025-08-20 12:26:52,183][dance][_sanitize_params] Params plan:\n",
      "\u001b[92m[{'max_val': 96, 'min_val': 1, 'mode': 'cv'},\n",
      " {'eps': 0.3, 'mode': 'standardize'},\n",
      " {'method': 'seurat3', 'num_genes': 9419},\n",
      " {'n_components': 636, 'svd_solver': 'arpack'},\n",
      " None]\u001b[0m\n",
      "[WARNING][2025-08-20 12:26:52,190][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data\n",
      "[WARNING][2025-08-20 12:26:52,195][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data\n",
      "/home/zyxing/dance/dance/transforms/filter.py:488: RuntimeWarning: invalid value encountered in divide\n",
      "  gene_summary = np.nan_to_num(np.array(x.std(0) / x.mean(0)), posinf=0, neginf=0).ravel()\n",
      "/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.\n",
      "  warnings.warn(\"Expecting count data as input, but the input feature matrix does not appear to be count.\"\n",
      "[INFO][2025-08-20 12:26:52,411][dance][_filter_seurat3] Start generating cell features using Seurat v3.0\n",
      "[WARNING][2025-08-20 12:26:52,460][dance.CellPCA][__call__] n_components=636 must be between 0 and min(n_samples, n_features)=466 with svd_solver='arpack'\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "0df7caa8625e48b3abad4562f20e1e9d",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View run <strong style=\"color:#cdcd00\">earnest-sweep-2</strong> at: <a href='https://wandb.ai/xzy11632/dance-dev/runs/h8j0oo74' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/runs/h8j0oo74</a><br/>Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Find logs at: <code>./wandb/run-20250820_122640-h8j0oo74/logs</code>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Run h8j0oo74 errored:\n",
      "Traceback (most recent call last):\n",
      "  File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/wandb/agents/pyagent.py\", line 308, in _run_job\n",
      "    self._function()\n",
      "  File \"/tmp/ipykernel_715844/3979844991.py\", line 88, in evaluate_pipeline\n",
      "    preprocessing_pipeline(data)\n",
      "  File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n",
      "    return self.functional(*args, **kwargs)\n",
      "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/dance/dance/pipeline.py\", line 247, in bounded_functional\n",
      "    a(*args, **kwargs)\n",
      "  File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n",
      "    return self.functional(*args, **kwargs)\n",
      "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/dance/dance/utils/wrappers.py\", line 128, in new_call\n",
      "    return original_call(self, data, *args, **kwargs)\n",
      "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/dance/dance/transforms/cell_feature.py\", line 177, in __call__\n",
      "    cell_feat = pca.fit_transform(feat)\n",
      "                ^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/utils/_set_output.py\", line 157, in wrapped\n",
      "    data_to_wrap = f(self, X, *args, **kwargs)\n",
      "                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/base.py\", line 1152, in wrapper\n",
      "    return fit_method(estimator, *args, **kwargs)\n",
      "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py\", line 460, in fit_transform\n",
      "    U, S, Vt = self._fit(X)\n",
      "               ^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py\", line 512, in _fit\n",
      "    return self._fit_truncated(X, n_components, self._fit_svd_solver)\n",
      "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py\", line 592, in _fit_truncated\n",
      "    raise ValueError(\n",
      "ValueError: n_components=466 must be strictly less than min(n_samples, n_features)=466 with svd_solver='arpack'\n",
      "\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m Run h8j0oo74 errored:\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m Traceback (most recent call last):\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/wandb/agents/pyagent.py\", line 308, in _run_job\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     self._function()\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/tmp/ipykernel_715844/3979844991.py\", line 88, in evaluate_pipeline\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     preprocessing_pipeline(data)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     return self.functional(*args, **kwargs)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/dance/dance/pipeline.py\", line 247, in bounded_functional\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     a(*args, **kwargs)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     return self.functional(*args, **kwargs)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/dance/dance/utils/wrappers.py\", line 128, in new_call\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     return original_call(self, data, *args, **kwargs)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/dance/dance/transforms/cell_feature.py\", line 177, in __call__\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     cell_feat = pca.fit_transform(feat)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m                 ^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/utils/_set_output.py\", line 157, in wrapped\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     data_to_wrap = f(self, X, *args, **kwargs)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/base.py\", line 1152, in wrapper\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     return fit_method(estimator, *args, **kwargs)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py\", line 460, in fit_transform\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     U, S, Vt = self._fit(X)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m                ^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py\", line 512, in _fit\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     return self._fit_truncated(X, n_components, self._fit_svd_solver)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_pca.py\", line 592, in _fit_truncated\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     raise ValueError(\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ValueError: n_components=466 must be strictly less than min(n_samples, n_features)=466 with svd_solver='arpack'\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m \n",
      "[INFO][2025-08-20 12:27:02,289][dance][wandb_sweep] \u001b[94m\n",
      "\n",
      "\t[*] Sweep ID: cyuki0fw\n",
      "\u001b[0m\n",
      "[INFO][2025-08-20 12:27:02,289][dance][wandb_sweep_agent] Spawning agent: sweep_id='cyuki0fw', entity='xzy11632', project='dance-dev', count=2\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Create sweep with ID: cyuki0fw\n",
      "Sweep URL: https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: jhryorbj with config:\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.max_val: 98\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.min_val: 4\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.mode: sum\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.eps: 0.1\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.mode: minmax\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.method: scmap\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.num_genes: 7435\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellSVD.algorithm: arpack\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellSVD.n_components: 793\n",
      "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "2a3d88f01aae494eba0f1f231adc5ab2",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(Label(value='Waiting for wandb.init()...\\r'), FloatProgress(value=0.011112662653128305, max=1.0…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Tracking run with wandb version 0.16.3"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Run data is saved locally in <code>/home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122705-jhryorbj</code>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Syncing run <strong><a href='https://wandb.ai/xzy11632/dance-dev/runs/jhryorbj' target=\"_blank\">clear-sweep-1</a></strong> to <a href='https://wandb.ai/xzy11632/dance-dev' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/run' target=\"_blank\">docs</a>)<br/>Sweep page: <a href='https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View project at <a href='https://wandb.ai/xzy11632/dance-dev' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View sweep at <a href='https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View run at <a href='https://wandb.ai/xzy11632/dance-dev/runs/jhryorbj' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/runs/jhryorbj</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO][2025-08-20 12:27:16,705][dance][set_seed] Setting global random seed to 10\n",
      "[INFO][2025-08-20 12:27:16,707][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv\n",
      "[INFO][2025-08-20 12:27:17,099][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv\n",
      "[INFO][2025-08-20 12:27:17,243][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv\n",
      "[INFO][2025-08-20 12:27:17,247][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv\n",
      "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.\n",
      "  warnings.warn(\n",
      "[INFO][2025-08-20 12:27:17,424][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088\n",
      "[INFO][2025-08-20 12:27:17,425][dance][_load_raw_data] Number of training samples: 262\n",
      "[INFO][2025-08-20 12:27:17,426][dance][_load_raw_data] Number of valid samples: 66\n",
      "[INFO][2025-08-20 12:27:17,427][dance][_load_raw_data] Number of testing samples: 138\n",
      "[INFO][2025-08-20 12:27:17,428][dance][_load_raw_data] Cell-types (n=9):\n",
      "['OPC',\n",
      " 'astrocytes',\n",
      " 'endothelial',\n",
      " 'fetal_quiescent',\n",
      " 'fetal_replicating',\n",
      " 'hybrid',\n",
      " 'microglia',\n",
      " 'neurons',\n",
      " 'oligodendrocytes']\n",
      "[INFO][2025-08-20 12:27:17,430][dance][load_data] Raw data loaded:\n",
      "Data object that wraps (.data):\n",
      "AnnData object with n_obs × n_vars = 466 × 22088\n",
      "    uns: 'dance_config'\n",
      "    obsm: 'cell_type'\n",
      "[INFO][2025-08-20 12:27:17,430][dance][wrapped_func] Took 0:00:00.723811 to load and process data.\n",
      "[INFO][2025-08-20 12:27:17,432][dance][_sanitize_params] Params plan:\n",
      "\u001b[92m[{'max_val': 98, 'min_val': 4, 'mode': 'sum'},\n",
      " {'eps': 0.1, 'mode': 'minmax'},\n",
      " {'method': 'scmap', 'num_genes': 7435},\n",
      " {'algorithm': 'arpack', 'n_components': 793},\n",
      " None]\u001b[0m\n",
      "[WARNING][2025-08-20 12:27:17,442][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data\n",
      "[WARNING][2025-08-20 12:27:17,448][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data\n",
      "/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.\n",
      "  warnings.warn(\"Expecting count data as input, but the input feature matrix does not appear to be count.\"\n",
      "[INFO][2025-08-20 12:27:17,628][dance][_filter_scmap] Start generating cell features using scmap\n",
      "[WARNING][2025-08-20 12:27:17,662][dance.CellSVD][__call__] n_components=793 must be between 0 and min(n_samples, n_features)=466 with svd_solver='full'\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "282db31ff00d4b2a8a5703f0a9882a78",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View run <strong style=\"color:#cdcd00\">clear-sweep-1</strong> at: <a href='https://wandb.ai/xzy11632/dance-dev/runs/jhryorbj' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/runs/jhryorbj</a><br/>Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Find logs at: <code>./wandb/run-20250820_122705-jhryorbj/logs</code>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Run jhryorbj errored:\n",
      "Traceback (most recent call last):\n",
      "  File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/wandb/agents/pyagent.py\", line 308, in _run_job\n",
      "    self._function()\n",
      "  File \"/tmp/ipykernel_715844/3979844991.py\", line 88, in evaluate_pipeline\n",
      "    preprocessing_pipeline(data)\n",
      "  File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n",
      "    return self.functional(*args, **kwargs)\n",
      "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/dance/dance/pipeline.py\", line 247, in bounded_functional\n",
      "    a(*args, **kwargs)\n",
      "  File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n",
      "    return self.functional(*args, **kwargs)\n",
      "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/dance/dance/utils/wrappers.py\", line 128, in new_call\n",
      "    return original_call(self, data, *args, **kwargs)\n",
      "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/dance/dance/transforms/cell_feature.py\", line 275, in __call__\n",
      "    cell_feat = svd.fit_transform(feat)\n",
      "                ^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/utils/_set_output.py\", line 157, in wrapped\n",
      "    data_to_wrap = f(self, X, *args, **kwargs)\n",
      "                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/base.py\", line 1152, in wrapper\n",
      "    return fit_method(estimator, *args, **kwargs)\n",
      "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_truncated_svd.py\", line 234, in fit_transform\n",
      "    U, Sigma, VT = svds(X, k=self.n_components, tol=self.tol, v0=v0)\n",
      "                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/_svds.py\", line 438, in svds\n",
      "    args = _iv(A, k, ncv, tol, which, v0, maxiter, return_singular_vectors,\n",
      "           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "  File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/_svds.py\", line 44, in _iv\n",
      "    raise ValueError(message)\n",
      "ValueError: `k` must be an integer satisfying `0 < k < min(A.shape)`.\n",
      "\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m Run jhryorbj errored:\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m Traceback (most recent call last):\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/wandb/agents/pyagent.py\", line 308, in _run_job\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     self._function()\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/tmp/ipykernel_715844/3979844991.py\", line 88, in evaluate_pipeline\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     preprocessing_pipeline(data)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     return self.functional(*args, **kwargs)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/dance/dance/pipeline.py\", line 247, in bounded_functional\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     a(*args, **kwargs)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/dance/dance/pipeline.py\", line 128, in __call__\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     return self.functional(*args, **kwargs)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/dance/dance/utils/wrappers.py\", line 128, in new_call\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     return original_call(self, data, *args, **kwargs)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/dance/dance/transforms/cell_feature.py\", line 275, in __call__\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     cell_feat = svd.fit_transform(feat)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m                 ^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/utils/_set_output.py\", line 157, in wrapped\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     data_to_wrap = f(self, X, *args, **kwargs)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/base.py\", line 1152, in wrapper\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     return fit_method(estimator, *args, **kwargs)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/sklearn/decomposition/_truncated_svd.py\", line 234, in fit_transform\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     U, Sigma, VT = svds(X, k=self.n_components, tol=self.tol, v0=v0)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/_svds.py\", line 438, in svds\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     args = _iv(A, k, ncv, tol, which, v0, maxiter, return_singular_vectors,\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m   File \"/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/scipy/sparse/linalg/_eigen/_svds.py\", line 44, in _iv\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m     raise ValueError(message)\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m ValueError: `k` must be an integer satisfying `0 < k < min(A.shape)`.\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \u001b[32m\u001b[41mERROR\u001b[0m \n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: Agent Starting Run: 9bwt95uk with config:\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.max_val: 95\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.min_val: 6\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.0.FilterGenesPercentile.mode: var\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.eps: 0.5\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.1.ColumnSumNormalize.mode: normalize\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.method: scmap\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.2.FilterGenesRegression.num_genes: 1609\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellSVD.algorithm: randomized\n",
      "\u001b[34m\u001b[1mwandb\u001b[0m: \tparams.3.CellSVD.n_components: 539\n",
      "Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "a8d2975e6c5646c2b315077f5f2ba22a",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(Label(value='Waiting for wandb.init()...\\r'), FloatProgress(value=0.01111244439250893, max=1.0)…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "wandb version 0.21.1 is available!  To upgrade, please run:\n",
       " $ pip install wandb --upgrade"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Tracking run with wandb version 0.16.3"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Run data is saved locally in <code>/home/zyxing/dance/examples/tuning/custom-methods/wandb/run-20250820_122733-9bwt95uk</code>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Syncing run <strong><a href='https://wandb.ai/xzy11632/dance-dev/runs/9bwt95uk' target=\"_blank\">fallen-sweep-2</a></strong> to <a href='https://wandb.ai/xzy11632/dance-dev' target=\"_blank\">Weights & Biases</a> (<a href='https://wandb.me/run' target=\"_blank\">docs</a>)<br/>Sweep page: <a href='https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View project at <a href='https://wandb.ai/xzy11632/dance-dev' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View sweep at <a href='https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/sweeps/cyuki0fw</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View run at <a href='https://wandb.ai/xzy11632/dance-dev/runs/9bwt95uk' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/runs/9bwt95uk</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[INFO][2025-08-20 12:27:44,260][dance][set_seed] Setting global random seed to 10\n",
      "[INFO][2025-08-20 12:27:44,262][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_data.csv\n",
      "[INFO][2025-08-20 12:27:44,657][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_data.csv\n",
      "[INFO][2025-08-20 12:27:44,802][dance][_load_dfs] Loading data from ../temp_data/train/human/human_Brain328_celltype.csv\n",
      "[INFO][2025-08-20 12:27:44,806][dance][_load_dfs] Loading data from ../temp_data/test/human/human_Brain138_celltype.csv\n",
      "/home/zyxing/anaconda3/envs/dance/lib/python3.11/site-packages/anndata/_core/anndata.py:430: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.\n",
      "  warnings.warn(\n",
      "[INFO][2025-08-20 12:27:44,982][dance][_load_raw_data] Loaded expression data: AnnData object with n_obs × n_vars = 466 × 22088\n",
      "[INFO][2025-08-20 12:27:44,983][dance][_load_raw_data] Number of training samples: 262\n",
      "[INFO][2025-08-20 12:27:44,984][dance][_load_raw_data] Number of valid samples: 66\n",
      "[INFO][2025-08-20 12:27:44,984][dance][_load_raw_data] Number of testing samples: 138\n",
      "[INFO][2025-08-20 12:27:44,985][dance][_load_raw_data] Cell-types (n=9):\n",
      "['OPC',\n",
      " 'astrocytes',\n",
      " 'endothelial',\n",
      " 'fetal_quiescent',\n",
      " 'fetal_replicating',\n",
      " 'hybrid',\n",
      " 'microglia',\n",
      " 'neurons',\n",
      " 'oligodendrocytes']\n",
      "[INFO][2025-08-20 12:27:44,987][dance][load_data] Raw data loaded:\n",
      "Data object that wraps (.data):\n",
      "AnnData object with n_obs × n_vars = 466 × 22088\n",
      "    uns: 'dance_config'\n",
      "    obsm: 'cell_type'\n",
      "[INFO][2025-08-20 12:27:44,988][dance][wrapped_func] Took 0:00:00.726977 to load and process data.\n",
      "[INFO][2025-08-20 12:27:44,990][dance][_sanitize_params] Params plan:\n",
      "\u001b[92m[{'max_val': 95, 'min_val': 6, 'mode': 'var'},\n",
      " {'eps': 0.5, 'mode': 'normalize'},\n",
      " {'method': 'scmap', 'num_genes': 1609},\n",
      " {'algorithm': 'randomized', 'n_components': 539},\n",
      " None]\u001b[0m\n",
      "[WARNING][2025-08-20 12:27:45,000][dance.FilterGenesPercentile][__call__] n_counts will be added to the var of data\n",
      "[WARNING][2025-08-20 12:27:45,005][dance.FilterGenesPercentile][__call__] n_cells will be added to the var of data\n",
      "/home/zyxing/dance/dance/transforms/filter.py:801: UserWarning: Expecting count data as input, but the input feature matrix does not appear to be count.Please make sure the input is indeed a count matrix.\n",
      "  warnings.warn(\"Expecting count data as input, but the input feature matrix does not appear to be count.\"\n",
      "[INFO][2025-08-20 12:27:45,232][dance][_filter_scmap] Start generating cell features using scmap\n",
      "[WARNING][2025-08-20 12:27:45,256][dance.CellSVD][__call__] n_components=539 must be between 0 and min(n_samples, n_features)=466 with svd_solver='full'\n",
      "[INFO][2025-08-20 12:27:46,636][dance.CellSVD][__call__] Generating cell SVD features (466, 1609) (k=466)\n",
      "[INFO][2025-08-20 12:27:46,639][dance.CellSVD][__call__] Top 10 explained variances: [0.01398817 0.01243401 0.00987709 0.00972114 0.00953149 0.00923472\n",
      " 0.00884004 0.00859845 0.00861041 0.00851821]\n",
      "[INFO][2025-08-20 12:27:46,640][dance.CellSVD][__call__] Total explained variance: 100.00%\n",
      "[INFO][2025-08-20 12:27:46,641][dance][set_config_from_dict] Setting config 'feature_channel' to 'feature.cell'\n",
      "[INFO][2025-08-20 12:27:46,642][dance][set_config_from_dict] Setting config 'label_channel' to 'cell_type'\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "016062ed0cd14266a2e4a296eb2b6595",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(Label(value='0.011 MB of 0.011 MB uploaded\\r'), FloatProgress(value=1.0, max=1.0)))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<style>\n",
       "    table.wandb td:nth-child(1) { padding: 0 10px; text-align: left ; width: auto;} td:nth-child(2) {text-align: left ; width: 100%}\n",
       "    .wandb-row { display: flex; flex-direction: row; flex-wrap: wrap; justify-content: flex-start; width: 100% }\n",
       "    .wandb-col { display: flex; flex-direction: column; flex-basis: 100%; flex: 1; padding: 10px; }\n",
       "    </style>\n",
       "<div class=\"wandb-row\"><div class=\"wandb-col\"><h3>Run history:</h3><br/><table class=\"wandb\"><tr><td>acc</td><td>▁</td></tr><tr><td>test_acc</td><td>▁</td></tr><tr><td>train_acc</td><td>▁</td></tr></table><br/></div><div class=\"wandb-col\"><h3>Run summary:</h3><br/><table class=\"wandb\"><tr><td>acc</td><td>0.36364</td></tr><tr><td>test_acc</td><td>0.06522</td></tr><tr><td>train_acc</td><td>0.67176</td></tr></table><br/></div></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       " View run <strong style=\"color:#cdcd00\">fallen-sweep-2</strong> at: <a href='https://wandb.ai/xzy11632/dance-dev/runs/9bwt95uk' target=\"_blank\">https://wandb.ai/xzy11632/dance-dev/runs/9bwt95uk</a><br/>Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Find logs at: <code>./wandb/run-20250820_122733-9bwt95uk/logs</code>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Example main.py\n",
    "\n",
    "import argparse\n",
    "import gc\n",
    "import os\n",
    "import pprint\n",
    "import random\n",
    "import sys\n",
    "from pathlib import Path\n",
    "from typing import get_args\n",
    "\n",
    "from dance.registry import register_preprocessor\n",
    "from dance.transforms.base import BaseTransform\n",
    "import torch\n",
    "import wandb\n",
    "import numpy as np\n",
    "\n",
    "from dance import logger\n",
    "from dance.datasets.singlemodality import CellTypeAnnotationDataset  # your dataset\n",
    "from dance.pipeline import PipelinePlaner, get_step3_yaml, run_step3, save_summary_data\n",
    "from dance.utils import set_seed\n",
    "from dance.typing import LogLevel\n",
    "from sklearn.random_projection import GaussianRandomProjection\n",
    "root_path = str(Path(__file__).resolve().parent if '__file__' in globals() else Path(\"tutorial.ipynb\").resolve().parent)\n",
    "\n",
    "# Import your custom SVM class\n",
    "# In reality, you'd do: from your_svm_file import SVM\n",
    "# from your_svm_file import SVM\n",
    "\n",
    "\n",
    "def main(args=None):\n",
    "    # Step 2: Parse arguments and configure hyperparameters\n",
    "    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n",
    "    parser.add_argument(\"--cache\", action=\"store_true\", help=\"Cache processed data.\")\n",
    "    parser.add_argument(\"--dense_dim\", type=int, default=400, help=\"dim of PCA\")\n",
    "    parser.add_argument(\"--gpu\", type=int, default=0, help=\"GPU id, set to -1 for CPU\")\n",
    "    parser.add_argument(\"--log_level\", type=str, default=\"INFO\", choices=get_args(LogLevel))\n",
    "    parser.add_argument(\"--species\", default=\"human\")\n",
    "    parser.add_argument(\"--test_dataset\", nargs=\"+\", default=[138], type=int, help=\"list of dataset id\")\n",
    "    parser.add_argument(\"--tissue\", default=\"Brain\")  # TODO: Add option for different tissue name for train/test\n",
    "    parser.add_argument(\"--train_dataset\", nargs=\"+\", default=[328], type=int, help=\"list of dataset id\")\n",
    "    parser.add_argument(\"--valid_dataset\", nargs=\"+\", default=None, type=int, help=\"list of dataset id\")\n",
    "    parser.add_argument(\"--tune_mode\", default=\"pipeline_params\", choices=[\"pipeline\", \"params\", \"pipeline_params\"])\n",
    "    parser.add_argument(\"--seed\", type=int, default=10)\n",
    "    parser.add_argument(\"--count\", type=int, default=2)\n",
    "    parser.add_argument(\"--sweep_id\", type=str, default=None)\n",
    "    parser.add_argument(\"--summary_file_path\", default=\"results/pipeline/best_test_acc.csv\", type=str)\n",
    "    parser.add_argument(\"--root_path\", default=root_path, type=str)\n",
    "    if args is None:\n",
    "        args = parser.parse_args()\n",
    "    else:\n",
    "        args = parser.parse_args(args)\n",
    "\n",
    "    # Construct the path to the tuning config file\n",
    "    file_root_path = Path(\n",
    "        args.root_path, \"_\".join([\n",
    "            \"-\".join([str(num) for num in dataset])\n",
    "            for dataset in [args.train_dataset, args.valid_dataset, args.test_dataset] if dataset is not None\n",
    "        ])).resolve()\n",
    "    logger.info(f\"\\n files are saved in {file_root_path}\")\n",
    "\n",
    "    # Instantiate pipeline planer from config file\n",
    "    pipeline_planer = PipelinePlaner.from_config_file(f\"{file_root_path}/{args.tune_mode}_tuning_config.yaml\")\n",
    "    os.environ[\"WANDB_AGENT_MAX_INITIAL_FAILURES\"] = \"2000\"\n",
    "\n",
    "    # Step 3: Define the evaluation function\n",
    "    def evaluate_pipeline(tune_mode=args.tune_mode, pipeline_planer=pipeline_planer):\n",
    "        \"\"\"\n",
    "        The evaluation function used by wandb_sweep_agent.\n",
    "        It:\n",
    "        1. Loads data.\n",
    "        2. Applies the pipeline.\n",
    "        3. Trains the model.\n",
    "        4. Evaluates it on the train, validation, and test splits.\n",
    "        5. Logs metric(s) to wandb.\n",
    "        \"\"\"\n",
    "        wandb.init(settings=wandb.Settings(start_method='thread'))\n",
    "        set_seed(args.seed)\n",
    "\n",
    "        # Load dataset\n",
    "        data = CellTypeAnnotationDataset(train_dataset=args.train_dataset, test_dataset=args.test_dataset,\n",
    "                                         valid_dataset=args.valid_dataset, species=args.species, tissue=args.tissue,\n",
    "                                         data_dir=\"../temp_data\").load_data()\n",
    "\n",
    "        # Preprocessing pipeline\n",
    "        kwargs = {tune_mode: dict(wandb.config)}\n",
    "        preprocessing_pipeline = pipeline_planer.generate(**kwargs)\n",
    "        preprocessing_pipeline(data)\n",
    "\n",
    "        # Retrieve training / testing data\n",
    "        x_train, y_train = data.get_train_data()\n",
    "        y_train_converted = y_train.argmax(1)\n",
    "        x_valid, y_valid = data.get_val_data()\n",
    "        x_test, y_test = data.get_test_data()\n",
    "\n",
    "        # Initialize our custom SVM model and train\n",
    "        # from your_svm_file import SVM  # Place your SVM import here\n",
    "        model = SVM(args, random_state=args.seed)\n",
    "        model.fit(x_train, y_train_converted)\n",
    "\n",
    "        # Evaluate model\n",
    "        train_score = model.score(x_train, y_train)\n",
    "        score = model.score(x_valid, y_valid)\n",
    "        test_score = model.score(x_test, y_test)\n",
    "\n",
    "        # Log results to wandb\n",
    "        wandb.log({\"train_acc\": train_score, \"acc\": score, \"test_acc\": test_score})\n",
    "        wandb.finish()\n",
    "\n",
    "    # Step 4: Run the sweep\n",
    "    entity, project, sweep_id = pipeline_planer.wandb_sweep_agent(\n",
    "        evaluate_pipeline, sweep_id=args.sweep_id, count=args.count)\n",
    "\n",
    "    # Step 5: Save summary data (top results, etc.)\n",
    "    save_summary_data(entity, project, sweep_id, summary_file_path=args.summary_file_path, root_path=file_root_path)\n",
    "\n",
    "    # Optionally, handle pipeline + parameter search steps\n",
    "    if args.tune_mode == \"pipeline\" or args.tune_mode == \"pipeline_params\":\n",
    "        get_step3_yaml(result_load_path=f\"{args.summary_file_path}\", step2_pipeline_planer=pipeline_planer,\n",
    "                       conf_load_path=f\"{Path(args.root_path).resolve().parent}/step3_default_params.yaml\",\n",
    "                       root_path=file_root_path)\n",
    "        if args.tune_mode == \"pipeline_params\":\n",
    "            run_step3(file_root_path, evaluate_pipeline, tune_mode=\"params\", step2_pipeline_planer=pipeline_planer)\n",
    "\n",
    "\n",
    "if __name__ == \"__main__\":\n",
    "    # Pass an empty argument list so argparse does not read the notebook kernel's argv\n",
    "    main([])\n"
   ]
  },
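  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note on the failed run in the log above.** The first sweep run requests `n_components=793` and stops with a `ValueError` from SciPy's `svds`, which requires `0 < k < min(A.shape)`. The traceback passes through the branch of scikit-learn's `TruncatedSVD` that calls `svds`, which is the `algorithm='arpack'` path, and the smaller matrix dimension here is 466. The second run requests `n_components=539` with `algorithm: randomized`, is reduced to `k=466`, and completes. The sketch below shows one way to keep a sampled rank inside the valid range. `clamp_n_components` is a hypothetical helper written for this tutorial, not a DANCE function; narrowing the `n_components` range in the sweep config is an alternative.\n",
    "\n",
    "```python\n",
    "def clamp_n_components(requested, shape, algorithm='randomized'):\n",
    "    # Largest rank the solver accepts for a matrix with this shape.\n",
    "    limit = min(shape)\n",
    "    if algorithm == 'arpack':\n",
    "        # scipy.sparse.linalg.svds requires 0 < k < min(A.shape)\n",
    "        limit -= 1\n",
    "    if limit < 1:\n",
    "        raise ValueError(f'matrix of shape {shape} is too small for {algorithm!r}')\n",
    "    return max(1, min(int(requested), limit))\n",
    "\n",
    "\n",
    "# Values taken from the log above: 466 cells, 1609 selected genes.\n",
    "print(clamp_n_components(793, (466, 1609), 'arpack'))      # 465\n",
    "print(clamp_n_components(539, (466, 1609), 'randomized'))  # 466\n",
    "```\n",
    "\n",
    "Because `WANDB_AGENT_MAX_INITIAL_FAILURES` is set to a large value in `main`, a failed run like this one does not stop the agent; the sweep simply moves on to the next configuration."
   ]
  },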
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Auto-Search Configuration\n",
    "\n",
    "The **configuration files** (e.g., `pipeline_params_tuning_config.yaml`, `pipeline_tuning_config.yaml`, `params_tuning_config.yaml`) guide the auto-search. Each file contains instructions for how to vary your preprocessing pipeline or model hyperparameters (or both). For example:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`pipeline_params_tuning_config.yaml`:\n",
    "\n",
    "```yaml\n",
    "type: preprocessor\n",
    "tune_mode: pipeline_params\n",
    "pipeline_tuning_top_k: 2\n",
    "parameter_tuning_freq_n: 2\n",
    "pipeline:\n",
    "  - type: filter.gene\n",
    "    include:\n",
    "      - FilterGenesPercentile\n",
    "      - FilterGenesScanpyOrder\n",
    "      - FilterGenesPlaceHolder\n",
    "    default_params:\n",
    "      FilterGenesScanpyOrder:\n",
    "          order: [\"min_counts\", \"min_cells\", \"max_counts\", \"max_cells\"]\n",
    "          min_counts: 1\n",
    "          max_counts: 134732\n",
    "          min_cells: 1\n",
    "          max_cells: 401\n",
    "  - type: normalize\n",
    "    include:\n",
    "      - ScaleFeature\n",
    "      - ScTransform\n",
    "      - Log1P\n",
    "      - NormalizeTotal\n",
    "      - NormalizePlaceHolder\n",
    "    default_params:\n",
    "      ScTransform:\n",
    "        processes_num: 8\n",
    "  - type: filter.gene\n",
    "    include:\n",
    "      # - HighlyVariableGenesLogarithmizedByMeanAndDisp\n",
    "      - HighlyVariableGenesRawCount\n",
    "      - HighlyVariableGenesLogarithmizedByTopGenes\n",
    "      - FilterGenesTopK\n",
    "      - FilterGenesRegression\n",
    "      # - FilterGenesNumberPlaceHolder\n",
    "    default_params:\n",
    "      FilterGenesTopK:\n",
    "        num_genes: 100\n",
    "      FilterGenesRegression:\n",
    "        num_genes: 100\n",
    "      HighlyVariableGenesRawCount:\n",
    "        n_top_genes: 100\n",
    "      HighlyVariableGenesLogarithmizedByTopGenes:\n",
    "        n_top_genes: 100\n",
    "  - type: feature.cell\n",
    "    include:\n",
    "      - WeightedFeaturePCA\n",
    "      - WeightedFeatureSVD\n",
    "      - CellPCA\n",
    "      - CellSVD\n",
    "      - GaussRandProjFeature  # Registered custom preprocessing func\n",
    "      - FeatureCellPlaceHolder\n",
    "    params:\n",
    "      out: feature.cell\n",
    "      log_level: INFO\n",
    "  - type: misc\n",
    "    target: SetConfig\n",
    "    params:\n",
    "      config_dict:\n",
    "        feature_channel: feature.cell\n",
    "        label_channel: cell_type\n",
    "wandb:\n",
    "  entity: xzy11632\n",
    "  project: dance-dev\n",
    "  method: grid #try grid to provide a comprehensive search\n",
    "  metric:\n",
    "    name: acc  # val/acc\n",
    "    goal: maximize\n",
    "\n",
    "\n",
    "```"
   ]
  },
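  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a feel for the size of the pipeline search space this config describes, the sketch below multiplies the number of candidates in each `include` list. It assumes the pipeline search picks exactly one candidate per step, which is an assumption about how the sweep enumerates pipelines rather than something taken from the DANCE source. The candidate lists are copied by hand from the YAML above (commented-out entries are left out), and the dictionary keys are labels chosen for this example.\n",
    "\n",
    "```python\n",
    "import math\n",
    "from itertools import islice, product\n",
    "\n",
    "# Candidate lists copied from the `include` entries of the config above.\n",
    "search_space = {\n",
    "    'filter.gene (first)': ['FilterGenesPercentile', 'FilterGenesScanpyOrder', 'FilterGenesPlaceHolder'],\n",
    "    'normalize': ['ScaleFeature', 'ScTransform', 'Log1P', 'NormalizeTotal', 'NormalizePlaceHolder'],\n",
    "    'filter.gene (second)': ['HighlyVariableGenesRawCount', 'HighlyVariableGenesLogarithmizedByTopGenes',\n",
    "                             'FilterGenesTopK', 'FilterGenesRegression'],\n",
    "    'feature.cell': ['WeightedFeaturePCA', 'WeightedFeatureSVD', 'CellPCA', 'CellSVD',\n",
    "                     'GaussRandProjFeature', 'FeatureCellPlaceHolder'],\n",
    "}\n",
    "\n",
    "# One candidate per step, chosen independently: 3 * 5 * 4 * 6 combinations.\n",
    "n_pipelines = math.prod(len(candidates) for candidates in search_space.values())\n",
    "print(n_pipelines)  # 360\n",
    "\n",
    "# Peek at the first few combinations without materializing all of them.\n",
    "for combo in islice(product(*search_space.values()), 3):\n",
    "    print(' -> '.join(combo))\n",
    "```\n",
    "\n",
    "The `--count` argument of `main.py` defaults to 2, so the example run in this notebook only tries two configurations. Raise `--count` if you want the grid search to cover more of the space."
   ]
  },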
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Tips**:\n",
    "\n",
    "1. With `tune_mode=pipeline`, the search only selects which preprocessing function to use at each pipeline step.  \n",
    "2. With `tune_mode=params`, the search only tunes the parameters of the functions in an already chosen pipeline.  \n",
    "3. With `tune_mode=pipeline_params`, the search runs in two stages: first over pipelines, then over the parameters of the best pipelines found.\n",
    "\n",
    "---\n",
    "\n",
    "## 4. Testing & Execution\n",
    "\n",
    "After setting everything up:\n",
    "\n",
    "```bash\n",
    "# Search only for the best preprocessing pipeline:\n",
    "python main.py --tune_mode pipeline\n",
    "\n",
    "# Tune only the parameters of a fixed pipeline:\n",
    "python main.py --tune_mode params\n",
    "\n",
    "# Two-stage search over both pipeline and parameters:\n",
    "python main.py --tune_mode pipeline_params\n",
    "```\n",
    "\n",
    "Once this completes, you should see the results logged to Weights & Biases (wandb). The `save_summary_data` function writes a CSV of the top-performing runs to the path given by `--summary_file_path`. If you selected `pipeline` or `pipeline_params`, the script also calls `get_step3_yaml` to generate the parameter-search configs for the second stage; with `pipeline_params`, that stage is then run automatically via `run_step3`."
   ]
  },
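  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before launching any of the commands above, the tuning config has to sit where `main()` looks for it. The sketch below mirrors the path construction in the example script: dataset ids are joined with `-` within a split and with `_` between splits, and splits left as `None` are skipped. `config_path` is a hypothetical helper written for this illustration, and the root directory name is only an example.\n",
    "\n",
    "```python\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def config_path(root_path, train, valid, test, tune_mode):\n",
    "    # Join ids with '-' inside a split and '_' between splits; skip missing splits.\n",
    "    dataset_dir = '_'.join('-'.join(str(i) for i in ids) for ids in (train, valid, test) if ids is not None)\n",
    "    return Path(root_path, dataset_dir, f'{tune_mode}_tuning_config.yaml')\n",
    "\n",
    "\n",
    "# Defaults from the example script: train on 328, no validation set, test on 138.\n",
    "path = config_path('classification_svm', [328], None, [138], 'pipeline_params')\n",
    "print(path.as_posix())  # classification_svm/328_138/pipeline_params_tuning_config.yaml\n",
    "```\n",
    "\n",
    "With the default arguments, the `dataset_name/` folder from the directory layout in Section 1 is therefore `328_138/`."
   ]
  },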
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Summary\n",
    "\n",
    "To recap, the workflow is:\n",
    "\n",
    "1. Inherit from the appropriate base class (in our case `BaseClassificationMethod`).\n",
    "2. Implement the `fit`, `predict`, and (optionally) `preprocessing_pipeline` methods.\n",
    "3. Integrate your custom model into the `main.py` script.\n",
    "4. Create and reference the necessary configuration (YAML) files.\n",
    "5. Run the search with `--tune_mode` set to `pipeline`, `params`, or `pipeline_params`.\n",
    "\n",
    "With these steps, you can plug any custom algorithm, from simple classification methods like an SVM to deep learning methods with pretraining steps, into this auto-search framework.\n",
    "\n",
    "Happy coding and good luck with your hyperparameter searches!\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dance",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}